-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't reuse RandomState seeds #37470
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
@rfcbot fcp merge Looks good to me, thanks @arthurprs! Want to run it by @rust-lang/libs just to be super sure though we're on board with this solution to #36481 |
Team member @alexcrichton has proposed to merge this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
I would not describe this as a solution to #36481. Performance is still unnacceptable with non-default hashers. It does work around the issue in the common case, and we should land it though. |
Do you think I need to reword the comment mentioning #36481? |
let r = rand::OsRng::new(); | ||
let mut r = r.expect("failed to create an OS RNG"); | ||
(r.gen(), r.gen()) | ||
UnsafeCell::new((r.gen(), r.gen())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this UnsafeCell
can be Cell
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I'll change it.
The in-code comment is fine to me, but I changed |
d70340c
to
eba93c3
Compare
I agree this is a good and efficient interim solution. To be clear, how much confidence do we have that addition here is actually a mathematically valid way to ensure that merging two hashmaps won't collide? My understanding is that it is clearly better than the status quo, that we suspect that in practice it will prevent collisions, but there's not any mathematical basis for knowing it will be effective (i.e. some clever person may find a way to generate collisions by incrementing the initialization value). |
Here are the benchmarks comparing this approach of incrementing this value to completely regenerating it. This approach is 2ns vs 6-15ns. |
The fix relies on the backing hash function being high-quality, as in "Every input bit should affect every output bit about ~50% of the time" (seed is also an input). |
🔔 This is now entering its final comment period, as per the review above. 🔔 psst @alexcrichton, I wasn't able to add the |
@bors: r+ |
📌 Commit eba93c3 has been approved by |
Don't reuse RandomState seeds cc rust-lang#36481
Don't reuse RandomState seeds cc rust-lang#36481
⌛ Testing commit eba93c3 with merge b243964... |
💔 Test failed - auto-win-msvc-64-opt-rustbuild |
@bors: retry On Fri, Nov 4, 2016 at 10:25 PM, bors notifications@github.com wrote:
|
Don't reuse RandomState seeds cc #36481
Don't reuse RandomState seeds cc rust-lang#36481
@arthurprs The seed is not an input. The seed describes a choice of a function from a family of functions. Regardless, the seed and the input are mixed in a similar way. Perhaps the solution is effective. I don't know cryptography. I can only read what cryptographers have explained. |
Shouldn't |
But what if change order of iteration just will fix original issue? We can use just simplest lcg generator for walking order:
|
@sfackler points out that given output of iteration one may construct new input values in desired order to DoS the application. Ok, lcg is very tunable, so we can show different order on every iteration:
|
@funny-falcon the problem with that approach is that the performance hit for iteration is possibly large. It should be easily measured though. |
@arthurprs you right: cache misses on big hashtables will destroy performance. So, my suggestion is not acceptable. |
Quoting the SipHash paper, pp. 5-6 (my boldface):
In the patch attached to this pull request, the One candidate solution that does not fall afoul of the random keys precondition would be:
This way the incrementing counter is part of the message, not the seed, which is fine because the PRF security claim allows the even more adverse case of "messages chosen adaptively by the attacker." With thread-local seeds, the unique ID can be an additional |
I think, it is a time to ask Jean-Phillipe Aumasson @veorq : does SipHash allows incrementing key, or will it fail? even though attacker never sees whole output of hashsum (it can only guess low bits, and even those bits are not exact cause of collision resolution)? |
As noted above we didn't make claims in the case of "related keys", but I haven't seen attack in the related-key model. TL;DR: I think it's ok. Details: IIUC the scenario here is the following:
The attacker can choose what messages are hashed. We can assume that the attacker sees the hashes of the messages with Given the previously published cryptanalysis results on SipHash, I'm confident that SipHash will remain secure under related keys and as used in Rust. |
Adaptive hashmap implementation All credits to @pczarn who wrote rust-lang/rfcs#1796 and contain-rs/hashmap2#5 **Background** Rust std lib hashmap puts a strong emphasis on security, we did some improvements in #37470 but in some very specific cases and for non-default hashers it's still vulnerable (see #36481). This is a simplified version of rust-lang/rfcs#1796 proposal sans switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself. **Proposal** This PR adds code checking for extra long probe and shifts lengths (see code comments and rust-lang/rfcs#1796 for details), when those are encountered the hashmap will grow (even if the capacity limit is not reached yet) _greatly_ attenuating the degenerate performance case. We need a lower bound on the minimum occupancy that may trigger the early resize, otherwise in extreme cases it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (it could potentially be exploited between 0-50% occupancy) but makes it completely safe. **Drawbacks** * May interact badly with poor hashers. Maps using those may not use the desired capacity. * It adds 2-3 branches to the common insert path, luckily those are highly predictable and there's room to shave some in future patches. * May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor. **Example** Example code that exploit the exposure of iteration order and weak hasher. ``` const MERGE: usize = 10_000usize; #[bench] fn merge_dos(b: &mut Bencher) { let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect(); let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect(); b.iter(|| { let mut merged = first_map.clone(); for (&k, &v) in &second_map { merged.insert(k, v); } ::test::black_box(merged); }); } ``` _91 is stdlib and _ad is patched (the end capacity in both cases is the same) ``` running 2 tests test _91::merge_dos ... bench: 47,311,843 ns/iter (+/- 2,040,302) test _ad::merge_dos ... bench: 599,099 ns/iter (+/- 83,270) ```
cc #36481